Comparative analysis of mitogenomes among three species of grasshoppers (Orthoptera: Acridoidea: Gomphocerinae) and their phylogenetic implications

Whole mitochondrial genomes have been widely used in phylogenetic analysis, population genetics and biogeography studies. This study sequenced and characterized three complete mitochondrial genomes (Dasyhippus peipingensis, Myrmeleotettix palpalis, Aeropedellus prominemarginis) and determined their phylogenetic position in Acrididae. The length of the mitochondrial genomes ranged from 15,621–15,629 bp and composed of 13 PCGs, 2 rRNA, 22 tRNA genes and an AT control region. The arrangement and structure of the mitochondrial genomes were similar to those of other invertebrates. Comparative genomics revealed that the three mitochondrial genomes were highly conserved in terms of gene size, structure, and codon usage, all PCGs were purified selections with an ATN start codon and a TAN stop codon. All tRNAs could be folded into the typical clover-leaf structure, except tRNA Ser (AGN) that lacked a dihydrouridine (DHU) arm. Phylogenetic analysis based on 13 PCGs of 34 Acrididae species and seven outgroup species revealed that differences in the shape of antennae within the family Acrididae should be given less weight as a taxonomic character for higher-level classification. Moreover, the divergence time estimates indicates that in Gomphocerinae, the species with clubbed antennae were formed within the nearest 18 Mya, and Pacris xizangensis is more ancient.


INTRODUCTION
Insects originated about 479 Mya in the Early Ordovician, and a pair of antennae can be found in the evolutionary lineage, which have evolved into various shapes during the subsequent evolution (Misof et al., 2014).Antennae are important structures for receiving and transmitting information, sensing chemical odors, humidity, and temperature of the external environment, and playing an important role in finding hosts, mating, and defense (Lan, Xiang & Zhu, 2023).The shape of antennae, the number of segments, the location of antennae and sensilla on the surface of antennae vary with different insect species.
As a major consumers of plants, grasshoppers play a crucial role in the functioning of global ecosystems, are an important component of food chains, and represent important agricultural pests (Song et al., 2018;Hawlitschek et al., 2017;Naz et al., 2020).Therefore, accurate identification of species is important for pest control, and requires traditional morphological methods and further verification from a molecular perspective.With the development of sequencing technology, molecular phylogenetic can use the differences in DNA sequences between species at the genetic level to further explore species identification and phylogenetic relationships.
In this research, complete mitogenomes of Dasyhippus peipingensis, Myrmeleotettix palpalis, and Aeropedellus prominemarginis were sequenced and analyzed.We compared the genomic organization and composition with other Gomphocerinae species and established its phylogenetic position in Acrididae using two different methods.We also constructed divergent time trees using BEAST to estimate the divergence times of these species in Acrididae.The results also provide contribute to further understanding in taxonomy and phylogeny evolution of the Acrididae.

Sample collection and DNA extraction
The samples used in this study are shown in Table 1.Specimens were preserved in 95% alcohol and then transferred to 4 C for cryopreservation.We performed an accurate identification under stereomicroscope based on its morphological characteristics by Fauna Sinica (Yin & Xia, 2003).Using the Animal Tissues/Cells Genomic DNA Extraction Kit (Solarbio, Beijing, China) and according to the manufacturer's instructions, total genomic DNA was extracted from the hind femora muscle.The quality of the total DNA was checked with a 1% agarose gel and the concentration was measured with a Nanodrop 2000 spectrophotometer.DNAs were stored at −20 C for long term storage and further molecular analyses.

DNA sequencing and assembly
Complete mitochondrial genomes were sequenced using the Illumina Novaseq 6000 platform with 151 bp paired end reads at Personalbio, Shanghai, China.Using the invertebrate genetic code in Genious 8.1.3(Kearse et al., 2012), by using the closely related known grasshoppers as reference sequences, complete mitochondrial genomes of Dasyhippus peipingensis, Myrmeleotettix palpalis, and Aeropedellus prominemarginis were assembled and aligned by using ClustalW (Larkin et al., 2007).Complete mitochondrial genome sequences were manually proofread in Genious 8.1.3to check the accuracy of the assembly.
Divergence time in Aricidiae was estimated using the 13 PCGs with relaxed molecular clock model in BEAST 1.10.4(Drummond et al., 2012).Coalescent: Constant Size model was used for the prior tree, ModelFinder was used to find the best model GTR+G+F4.Divergence time tree nodes from Chang et al. (2020) were used for calibration (separation time 115 Mya for Tetrigoidea; 71 Mya for Chrotogonidae; 56 Mya for Pamphagidae; 35 Mya for Catantopinae; 33 Mya for Oedipodinae).The Markov chain was run 100,000,000 generations, sampling every 10,000 generations, 25% was burn in.The stability of the results was verified by Tracer v1.7.2 with most parameters having more than 200 effective sample size (ESS) values.ChiPlot (https://www.chiplot.online/tvbot.html) was used to visualize maximum clade credibility tree with 95% highest probability density (95% HPD).

Genome content and organization
We sequenced and annotated the whole mitochondrial genomes which performed visual editing.The complete mitogenome sequence of Dasyhippus peipingensis, Myrmeleotettix palpalis, and Aeropedellus prominemarginis were 15,628, 15,621 and 15,629 bp, respectively.These mitogenomes showed typical insect mitogenome structure, which composed of circular double-stranded DNA molecules (Fig. 1).Each mitogenome includes 13 protein-coding genes (PCGs), 22 tRNA genes, two rRNA genes and an A+T rich region (control region).There were 23 genes (including nine PCGs and 14 tRNAs) are encoded on majority-strand (J-strand) and 14 genes (including four PCGs, eight tRNAs and two rRNAs) are transcribed from the minority-stand (N-strand) (Table 3).
The nucleotide compositions of the three mitochondrial genomes revealed a distinct A/T bias: 75.7% (Dasyhippus peipingensis), 75.2% (Myrmeleotettix palpalis), and 75.0%(Aeropedellus prominemarginis).All mitochondrial genomes were positive for A+T skew and negative for GC skew (Table 4).The complete mitochondrial genomes and PCGs of three grasshopper species had A+T contents higher than 64% at different compositional sites and locations.The control regions (A+T-rich region) of the mitochondrial genomes were all located between tRNA-Ile and rrnS, with sizes of 729 bp (Dasyhippus peipingensis),     2023), PeerJ, DOI 10.7717/peerj.165509/25 A+T content was >83%, which were also referred to as AT-rich regions.The structures and nucleotide compositions of the three species are generally consistent with the mitochondrial genome structure of the Acrididae (Zhang et al., 2023;Zheng et al., 2021), indicating that these mitochondrial genome's structure is highly conserved (Wei et al., 2010).
The total lengths of the 13 PCGs of Dasyhippus peipingensis, Myrmeleotettix palpalis and Aeropedellus prominemarginis are 11,190,11,193 and 11,190 bp,respectively (Table 4), accounting for 71.6%, 71.7%, and 71.6% of the whole mitochondrial genome, respectively.The size of the PCGs ranges from 162 bp (ATP8) to 1,719 bp (ND5).Among the 13 PCGs, The third codon position has the highest A+T content, while the second codon position has the lowest A+T content.All the initiation codons in the mitogenomes of the three species were ATN, with ATG being the most frequently used, termination codons were TAN, and TAA was the most frequently used termination codon.
To indicate the frequency of codon usage, the relative synonymous codon usage (RSCU) values of the three mitochondrial genomes were visualized (Fig. 2; Table S3).Comparative analysis showed that the synonymous codon preferences were highly conserved among three mitochondrial genomes.The most frequently used codons are TTT, TTA, ATT, and ATA, therefore, Phe, Leu (UUR), IIe, and Met are most frequently used amino acids, accounting for 7.86%, 9.67%, 9.31%, and 5.81% of total, respectively.In addition, RSCU analysis also showed a bias towards using more A/T at the third codon position rather than G/C.Similarly, the frequency of codon usage indicates the preference of nucleotide A/T in three species.
As other insects, the mitochondrial genome of three species contain 22 tRNA genes with lengths ranging from 62-71 bp (Table 3), the total length of tRNAs is ranging from 1,475 bp (Dasyhippus peipingensis) to 1,476 bp (Myrmeleotettix palpalis and Aeropedellus prominemarginis).The A+T content of tRNAs is 73.2%, 72.6%, and 72.7% for three mitochondrial genomes, with positive AT skew and GC skew.Most tRNAs could be folded into the typical clover leaf secondary structure, except that tRNA-Ser (AGN) lacked a dihydrouridine (DHU) arm and formed a simple loop (Fig. 3).The secondary structure of tRNAs is usually conserved in the amino acid acceptor arm and anticodon loop, while DHU and TψC are more variable.In addition to the classic base pairs A-U and C-G, there are also noncanonical base pairings (G-U and A-C) and mismatched base pairs (A-A and A-G) distributed throughout the tRNA arms, with G-U noncanonical base pairs being the most abundant.
The two ribosomal RNA genes are encoded on the N-strand among three grasshoppers (Table 3), rrnL is located between tRNA-Leu (CUN) and tRNA-Val, while rrnS is flanked by tRNA-Val and A+T rich regions.The rrnL of Dasyhippus peipingensis, Myrmeleotettix palpalis and Aeropedellus prominemarginis are 1,319, 1,316 and 1,312 bp in length, contains the A+T content ranging from 77% to 78%.The rrnS is 843bp in Dasyhippus peipingensis and Myrmeleotettix palpalis, 844 bp in Aeropedellus prominemarginis, with AT content ranging from 73.6% to 75.5%.Therefore, there were no significant differences in rRNAs among three species.Both rrnL and rrnS exhibit negative AT-skew and positive GC-skew in three mitogenomes.
Nucleotide diversity analysis can identify regions with large nucleotide divergence, which is useful for designing species-specific markers in groups within taxa where morphological identification is difficult and taxonomic boundaries are blurred (Ma et al., 2019;Xie et al., 2011;Yuan et al., 2022).The nucleotide diversity (Pi values) of 10 species was analyzed by sliding window analysis (Fig. 4).In the mitochondrial genomes of the 10 gomphocerine species, nucleotide diversity is highly variable.Nucleotide diversity ranges from 0.08 to 0.132, with higher nucleotide diversity in genes ND2, ND6, and ATP8, which are 0.132, 0.121, and 0.115, respectively.In contrast, the nucleotide diversity of COX1, ND1, and ND4L is lower, with 0.094, 0.084, and 0.080, respectively.This indicates that COX1, ND1, and ND4L are relatively conserved genes.Ka/Ks indicates the ratio between the non-synonymous substitution rate (Ka) and the synonymous substitution rate (Ks) of two protein-coding genes, which can be used as an important marker to estimate the evolutionary rate.We calculated the Ka/Ks values of the mitochondrial genomes among 10 species of gomphocerine (Fig. 5; Table S4).The Ka/Ks values of all PCGs were less than one, indicating that these genes evolved under purifying selection and were evolutionarily conserved in the mitochondrial genome.ND6 had the highest Ka/Ks value, followed by ATP8 and ND5 and COX1 had minimum Ka/Ks value (Ka/Ks = 0.058) and low evolutionary rate, indicating that the COX1 gene had strong purifying selection and evolutionary conservation, which could be used as an important marker to identify relatedness among species, therefore, a partial fragment of COX1 is often used as DNA barcodes for inferring species phylogenetic relationships (Hebert, Ratnasingham & deWaard, 2003).In contrast, ND6 had the highest Ka/Ks value (Ka/Ks = 0.293), showing a faster evolutionary rate with less selection pressure in PCGs, which undergone relatively weak purifying selection.It can be used to assess intraspecific relationships and is more suitable as a potential molecular marker in population genetics (Yuan et al., 2021;Pu et al., 2022;Chen et al., 2021).

Phylogenetic analysis
The heterogeneity sequence divergence of the two matrixes PCG123 and PCG12 was assessed (Fig. 6), both indicated that the mitochondrial genomes of gomphocerine species showed lower heterogeneity (the similar scores for pairwise sequence comparisons were the lowest).Furthermore, among the species in Gomphocerinae, Orinhippus tibetanus shows higher heterogeneity than others.Substitution saturation of PCGs of 41 sequences were tested, and Xia's analysis showed Iss < Iss.c and p < 0.05, revealed that base substitutions had not saturated and phylogenetic analysis could be performed.We investigated the phylogenetic position of species among Gomphocerinae within Acrididae (Fig. 7).Tree topologies were consistent from both BI and ML analysis with high bootstrap values (BS) and bayesian posterior probability values (PP) in most clades.In Gomphocerinae, except for Orinhippus tibetanus, all other species showed significant monophyly (PP = 100, BS = 1).Orinhippus tibetanus and the species of Oedipodinae are recovered as sister groups, which is greatly supported (PPs = 1, BSs = 100).The results further support the reclassification of Orinhippus tibetanus as a member of the Oedipodinae using molecular systematics by Gao, Chen & Jiang (2017).In addition, by means of species descriptions of the genus and type species of the Orinhippus, previous research has classified the genus as a member of Locustinae (=Oedipodinae) (Uvarov, 1921;Bei-Bienko & Mishchenko, 1951), our results corroborate this ancient view from a molecular point of view.Therefore, we believe that antennal structure is of lesser importance as a higher order taxonomic character for grasshoppers.Moreover, this study clarifies the phylogenetic status of the genus Aeropedellus in Gomphocerinae, which is recovered a sister group with the genus Dasyhippus.Phylogenetic analysis showed that the filiform antennae tend to be ancestral, while the condensed and expanded antennal flagellum which ends gradually to form a clubbed shape, appears to be more evolutionarily advanced (Fig. 8).This may first occur in microclub-shape antennae of plateau species Pacris xizangensis, and the subsequently differentiated genera Myrmeleotettix and Aeropedellus show similar antennal morphology, in which the male antennal ends are slightly expanded at 5-7 segments, forming antennal ends that are twice as wide as long.Starting from Dasyhippus, the male antennal ends are very expanded at 7-8 segments, forming antennal ends that are significantly wider than Convergent evolution refers to independent lineages evolving similar phenotypes under similar selective pressures (Fraser & Whiting, 2019), but the phenomenon of convergent evolution is not easy to identify in evolution.Both Orinhippus tibetanus and Pacris xizangensis are distributed in Tibet, with very similar altitudes and environmental factors in their habitat, both species are classified in Gomphocerinae.However, based on phylogenetic analysis using mitochondrial genomes, Orinhippus tibetanus and Pacris xizangensis are on two independent clades.This study speculated that Orinhippus tibetanus and Pacris xizangensis may have been subjected to similar environmental selection pressures that formed similar antennal morphology convergently.
The phylogenetic relationships among Melanoplinae, Catantopinae, and Oxyinae are unclear in this tree, which may be the result of incomplete sampling.However, since they are not the focus of this research, no further discussion has been conducted.The low confidence level may be due to insufficient sampling, and in further studies, a wide range of sampling and multiple methods may be used to explore the phylogenetic relationships of these subfamily units in Acrididae.Further understanding of the triggering factors for evolution and the convergence of ecological forms in entire tree of life may help clarify genomic constraints and historical contingencies that have led to convergent evolution.According to the BEAST analysis, the formation of club shape antennae occurred approximately 18 Mya in Pacris, after which there were two independent evolutionary events, resulting in extreme enlargement of the antennal ends (Gomphocerus and Dasyhippus) and slight enlargement (Myrmeleotettix and Aeropedellus).The clubbed shape antennae are an important taxonomic characteristic at the family level, which is occurred only in species of the Gomphocerinae (Yin, 1982;Yin & Xia, 2003).However, the divergence times indicate that the earliest clade of the gomphocerine grasshoppers, Pacris xizangensis, diverged at approximately 18 Mya (15-23 Mya, 95% HPD).Compared with the divergence times of subfamilies such as Oedipodinae, Acridinae, and Melanoplinae, the clubbed antennae grasshoppers have a relatively brief divergence history, may not to reach the family category, which confirms the current classification system that places them in Gomphocerinae under the tribe Gomphocerini (Otte, 1981).The results of the divergence time estimates reconfirmed that antennal morphology should be given less weight as a taxonomic character for grasshoppers higher-level classification.It also suggests that species within the subfamily Gomphocerinae are not monophyletic.
Limitations in selecting gene fragments, lateral gene transfer during early evolution, or ancestral polymorphisms may result in phylogenetic incongruence between morphological and molecular data during speciation events.Feng et al. (2022) suggested that constructing a species relationship tree based on only partial genes and phenotypes may not be reliable, and genome-wide data is the gold standard for reconstructing the evolutionary history of species.However, the genome of grasshoppers is significantly larger than other insects (Alfsnes, Leinaas & Hessen, 2017;Husemann et al., 2020), whole genome sequencing is expensive and it is difficult to analyze the data.Therefore, it is often more practical to select more conservative molecular markers to explore the true evolutionary relationships among species in Acrididae.In addition, incomplete lineage sorting and convergent evolution may also cause contradictions between morphology and molecular data.The taxonomic category of clubbed antennae grasshoppers in Acrididae requires deeper investigation using larger scale sampling.

CONCLUSIONS
The mitochondrial genomes of three Gomphocerinae species, Dasyhippus peipingensis, Myrmeleotettix palpalis and Aeropedellus prominemarginis were sequenced, annotated, and analyzed.The results demonstrated that size and structure of the mitochondrial genomes in the three species were conservative and identical to others in Acrididae.The nucleotide composition of three species showed a strong AT bias in mitochondrial genome.The codon usage of protein-coding genes was highly conserved, except for tRNA-Ser (AGN), which lacks a dihydrouridine (DHU) arm, all other tRNAs could fold into a typical cloverleaf structure.There was no significant difference in the size of rRNAs among three species.The Ka/Ks values of all PCGs were <1, indicating that these genes evolved under purifying selection.
This study used complete mitochondrial genomes to explore the phylogenetic relationships among several grasshoppers within Gomphocerinae and determined the phylogenetic status of the genus Aeropedellus.The results provide new and important information about the classification of Gomphocerinae.In Acrididae, differences in antennal shape should be given less weight as a taxonomic character for higher-level classification.To deeply explore the phylogenetic relationships among grasshoppers, increased sampling of taxa and selection of multiple genes is needed to reconstruct more comprehensive phylogenetic relationships.The data will be included in the Supplemental Files due to its unavailability for public disclosure.If the publication is completed, the data will be made publicly available.

Figure 3
Figure 3 Twenty-two tRNA secondary structures identified in the mitochondrial genome of the Aeropedellus prominemarginis.Base pairing and mismatches indicated by (−) and (+), respectively.The variable sites in the mitochondrial genomes of the other two locust species are represented by red circles.Full-size  DOI: 10.7717/peerj.16550/fig-3

Figure 4 Figure 5
Figure 4 10 gomphocerine species PCGs nucleotide diversity.The blue curve represents the value of nucleotide diversity (Pi), and the average value of each gene is displayed below the gene name.Full-size  DOI: 10.7717/peerj.16550/fig-4

Figure 6
Figure 6 Mitochondrial genome sequences heterogeneity differences in the matrix PCG123 and PCG12.The average similarity score between the sequences is represented by a colored square.The AliGROOVE score ranges from −1 to +1 (the red part indicates significant heterogeneity, and the dark part indicates significant homogeneity).Full-size  DOI: 10.7717/peerj.16550/fig-6

Figure 7
Figure 7 Phylogenetic evolution inferred from the BI and ML of the PCG123 matrix.The numbers on the branches represent the posterior probability (PP)/bootstrap value (BS).Full-size  DOI: 10.7717/peerj.16550/fig-7

Figure 9
Figure 9 Divergence time tree.The blue bars represent the 95% confidence intervals of the estimated time, and the numbers on the nodes indicate the mean divergence time.Full-size  DOI: 10.7717/peerj.16550/fig-9

Table 2
Summary of mitogenomes used in this study.

Table 3
List of total size and intergenic nucleotides for mitochondrial genes with lengths of genes, anticodons of tRNAs and start/stop codons of protein-coding genes.

Table 3
N and J indicate that the gene was located in the minor (N) and major (J) strand."/.": same as the one previous to it.The species are as follows: Dasyhippus peipingensis; Myrmeleotettix palpalis; Aeropedellus prominemarginis. Note:

Table 4
Nucleotide composition of three mitochondrial whole genomes.